-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel][Metrics][PR#2] Add SnapshotReport for reporting snapshot construction #3903
Conversation
e1dbe30
to
ceb5a56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Reviewed the production code, not the tests yet
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotContext.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotContext.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotContext.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionBuilderImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotReportImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotReportImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/metrics/SnapshotMetricsResult.java
Outdated
Show resolved
Hide resolved
ceb5a56
to
db5c16c
Compare
db5c16c
to
8e1f651
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just posting what we discussed in a sync! Looks great!
-
Would be great to use factory methods to create
SnapshotReportImpl
s. One for success/normal cases, one for error case. This should take in theSnapshotQueryContext
. -
Ensure you have the same exact method / value names between SnapshotMetrics and SnapshotMetricsResult. Also, add docs to both that explain the relationship. i.e. SnapshotMetrics creates SnapshotMetricsResult; SnapshotMetricsResult is created by SnapshotMetrics
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/MetricsReportSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/MetricsReportSuite.scala
Show resolved
Hide resolved
class MetricsReportSuite extends AnyFunSuite with TestUtils { | ||
|
||
/////////////////////////// | ||
// SnapshotReport tests // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many different types of reports do you anticipate we will have? If test + helper code for just SnapshotReport is ~300 LOC, I wonder if we should split this up into different files?
e.g. SnapshotMetricsReportSuite extends MetricsReportTestUtils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now this is okay. I think the other report types will have fewer tests (much fewer error cases). If it ends up being a lot in a later PR I can separate the suites then.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Left some comments and questions. Thanks!
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotMetrics.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotQueryContext.java
Show resolved
Hide resolved
import io.delta.kernel.metrics.SnapshotReport; | ||
import java.util.Optional; | ||
|
||
/** Stores the context for a given Snapshot query. Used to generate a {@link SnapshotReport} */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is named SnapshotQueryContext
... and you say that it "stores the context" ... but can you define a bit what "context" means? You don't need to list all of the information it contains .. but at high level, what does context mean?
Stores the context for a given Snapshot query. Used to generate a {@link SnapshotReport}.
Here, "context" means the state and information pertaining to when a Snapshot was constructed, for which table, and its associated metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also talk a bit about the expected lifecycle of this class. What is it coupled to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the class docs
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotQueryContext.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotQueryContext.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/metrics/SnapshotReport.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Minor comments mainly on variable/api naming. After this LGTM
kernel/kernel-api/src/main/java/io/delta/kernel/metrics/SnapshotMetricsResult.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotMetrics.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotQueryContext.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
default String operationType() { | ||
return DeltaOperationReport.OperationType.SNAPSHOT.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. If we return String
still I wonder if the enum isn't as useful? what if we returned the enum value instead?
i.e. how do we expect this operationType()
method to be used?
I'm fine with whatever you decide, no need to block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is that it's useful to go to DeltaOperationType.OperationType
and see all the possible operation types we might encounter. I think it makes sense to be string instead of the enum to try to keep the metric return types to standard built in types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe the answer to that is we don't need the enum and it's possible to just look at all the metrics interfaces we have added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is that it's useful to go to DeltaOperationReport.OperationType and see all the possible operation types we might encounter.
But there's nothing enforcing this, right? It's just declaring an enum -- children of DeltaOperationReport
don't actually need to add a new type to this enum, they could just return an arbitrary string for String operationType()
.
If we had operationType
return an enum, then it's fair to require and expect that children will add new values for the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe the answer to that is we don't need the enum and it's possible to just look at all the metrics interfaces we have added.
I think the question is: how will the result of operationType
be used? is it just for printing? do we want to check for a particular value (in this case, an enum would be nice)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think we would probably wouldn't check for a specific value. If you want a snapshot report you would check if it's an instance of SnapshotReport
.
I think i'll remove the enum and keep operationType
to return a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM!
kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/Timer.java
Outdated
Show resolved
Hide resolved
…ricsReporter for the default engine (#3904) <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [X] Kernel - [ ] Other (fill in here) ## Description This PR is based off of #3903 See the diff for just this PR [here](https://github.com/delta-io/delta/pull/3904/files/aec95cf3dc0086c37f4c45e2b3e192b7b881768c..678ac473f4de65a8f7fd770696aad2d31a15aef7) Adds a JSON serializer for metrics reports with serialization logic for SnapshotReport. Also adds a `LoggingMetricsReporter` to the default implementation which simply logs the JSON serialized reports using Log4J. ## How was this patch tested? Adds a test suite. ## Does this PR introduce _any_ user-facing changes? No.
Which Delta project/connector is this regarding?
Description
Adds a
SnapshotReport
for reporting snapshot construction.We record a
SnapshotReport
after successfully constructing a snapshot or if an exception is thrown during construction. We useSnapshotContext
to propagate and update information about the snapshot construction. For example, we update the "version" as soon as it's resolved for time-travel by timestamp or load latest snapshot queries. This means in the case of the exception we include as much information as is available.How was this patch tested?
Adds a test suite
MetricsReportSuite
with unit tests.Does this PR introduce any user-facing changes?
No.